1 Abstract


In the ninth quarter of the work effort, we focused on a) conducting experiments on real-world
data sets using the developed algorithms, b) design and implementation of the Multiscale
Heat-Kernel Coordinates (MHKC) algorithms, and c) packaging the software for release as open
source. This report documents the algorithm design for the MHKC algorithms.

The project is currently on track. In the upcoming quarter, we will continue applying the
developed algorithms to various data sets and continue the design and implementation of the
MHKC algorithms. No problems are currently anticipated.
2 Summary


In this quarter, we continued the design and implementation of the new Multiscale Heat Kernel
Coordinates (MHKC) algorithms. The current design for the MHKC algorithms is documented in
this report.

The project is currently on track. In the upcoming quarters, we will continue applying the
developed algorithms to various data sets and focus on the design and development of the
MHKC algorithms. No problems are currently anticipated.


Use  or  disclosure  of  data  contained  on  this  sheet  is  subject  to  restrictions  on  the  title  page  of  this  report. 
3 Introduction


The primary project effort over the last quarter focused on completing the design and
development of the Multiscale Heat-Kernel Coordinates (MHKC) algorithms, which provide a
powerful tool for discovering the non-linear geometries in a given dataset. The MHKC
algorithms utilize the fast randomized Singular Value Decomposition (RSVD) algorithms
described in the earlier ONR reports [7] [8]. Use of the RSVD effectively reduces the
computational complexity from $O(mnk)$ to $O((m+n)k^2)$ for an $m \times n$ matrix of rank $k$.
In contrast to the multiscale Singular Value Decomposition (MSVD) algorithms, which detect
linear structures in data at multiple scales, the MHKC uses heat kernels to discover the
non-linear manifold structure in which the data resides at various scales. Similar to the
MSVD, the MHKC provides an efficient representation using low-dimensional coordinates
corresponding to the original data points.
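
As a rough illustration of the kind of randomized SVD referred to above, the following is a minimal sketch of a basic Gaussian range-finder variant. It is not taken from the earlier reports [7] [8]; the function name `randomized_svd` and the `oversample` parameter are assumptions made for this example. This plain version still forms dense products with the input matrix, so it illustrates the structure of the method rather than the cost figure quoted above.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD via a Gaussian range finder (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample the range of A with a random Gaussian test matrix.
    omega = rng.standard_normal((n, k + oversample))
    Y = A @ omega
    # Orthonormal basis for the sampled range.
    Q, _ = np.linalg.qr(Y)
    # Project A onto that basis and take a small dense SVD.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]

# Usage on a synthetic matrix of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 80))
U, s, Vt = randomized_svd(A, k=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

For a matrix whose rank does not exceed `k`, the relative reconstruction error `rel_err` should be at the level of floating-point round-off.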


4 Methods, Assumptions and Procedures


4.1  Multiscale  Heat  Kernel  Coordinates 

The Multiscale Heat Kernel Coordinates (MHKC) algorithms are based on theoretical results
presented in [1]. The current algorithm design is described below.

Input: A set of $n$ data points in $\mathbb{R}^d$. Assume $n$ is large.

Step 1: The first step constructs the data matrix to be provided as input to the
RSVD algorithm. Define the heat kernel as

$$k(x, y) = \exp\left(-\|x - y\|^2 / t_0\right)$$

for any two points $x$ and $y$. Here, $t_0$ is a (data-dependent) constant representing the
kernel window size. The heat kernel matrix is then defined as

$$K = \{k_{ij}\}, \quad \text{where } k_{ij} = k(x_i, x_j)$$

for $i, j = 1, 2, \ldots, n$. The transition probability matrix is $P = D^{-1} K$, where $D$ is the
diagonal matrix whose $i$-th entry is the sum of the $i$-th row of $K$.
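
Step 1 can be sketched in a few lines of NumPy. This is a minimal dense illustration, assuming the kernel uses the squared Euclidean distance; the function names `heat_kernel_matrix` and `transition_matrix` are hypothetical and chosen for this example only. A dense $n \times n$ matrix is practical only for small $n$, so this is meant to show the definitions, not a scalable implementation.

```python
import numpy as np

def heat_kernel_matrix(X, t0):
    """K[i, j] = exp(-||x_i - x_j||^2 / t0) for the rows x_i of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / t0)

def transition_matrix(K):
    """P = D^{-1} K, where D is the diagonal matrix of row sums of K."""
    row_sums = K.sum(axis=1)
    return K / row_sums[:, None]

# Usage on a small random point set (6 points in R^3).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
K = heat_kernel_matrix(X, t0=2.0)
P = transition_matrix(K)
```

Each row of `P` sums to one, which is what makes it a transition probability matrix; `K` is symmetric, while `P` generally is not.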

Note that $P$ is not symmetric. There are various techniques to symmetrize $P$ such that the
eigenvalues and eigenfunctions are still easy to compute. One way is to define

$$P' = D^{1/2} P D^{-1/2}$$

$P'$ is symmetric with the same eigenvalues as $P$. Also, the eigenvectors of $P$ can easily be
obtained from those of $P'$ using a simple transformation by either $D^{-1/2}$ or $D^{1/2}$.
The RSVD is used to compute the spectrum of $P'$.
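
The symmetrization can be checked numerically. In the sketch below, a dense symmetric eigen-solver (`np.linalg.eigh`) stands in for the RSVD, and the kernel is assumed to use the squared Euclidean distance; both are choices made for this example. Since $P = D^{-1/2} P' D^{1/2}$, an eigenvector $u$ of $P'$ gives a right eigenvector $D^{-1/2} u$ of $P$ with the same eigenvalue.

```python
import numpy as np

# Small synthetic data set: 8 points in R^2.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 2))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
K = np.exp(-sq / 1.0)          # heat kernel matrix with t0 = 1
d = K.sum(axis=1)              # diagonal entries of D
P = K / d[:, None]             # P = D^{-1} K

# P' = D^{1/2} P D^{-1/2}, which equals D^{-1/2} K D^{-1/2}.
P_sym = np.sqrt(d)[:, None] * P / np.sqrt(d)[None, :]

# Symmetric eigen-decomposition, sorted by decreasing eigenvalue.
evals, U = np.linalg.eigh(P_sym)
order = np.argsort(evals)[::-1]
evals, U = evals[order], U[:, order]

# Right eigenvectors of P: v = D^{-1/2} u.
V = U / np.sqrt(d)[:, None]
```

The columns of `V` satisfy `P @ V == V * evals` up to round-off. The leading eigenvalue is 1 and its eigenvector is constant, which is the trivial pair that carries no geometric information.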

Step 2: Next, the heat kernel coordinates are defined for each of the original data points.
Let the eigenvalues of $P$ be denoted $\lambda_j$ and the right eigenvectors $v_j$, for
$j = 1, 2, \ldots, r$, where $r = \mathrm{rank}(P)$.

Each point $x_i$ is then represented as

$$\mathrm{HKC}(x_i) = \left( e^{-\lambda_1 t} v_{1i},\; e^{-\lambda_2 t} v_{2i},\; \ldots,\; e^{-\lambda_r t} v_{ri} \right)$$

where $v_{ji}$ is the $i$-th coordinate of the eigenvector $v_j$. Here $t$ is the time/scale
parameter, which is varied to examine the geometry of the data set at various scales.
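
Putting Steps 1 and 2 together, the coordinates can be sketched as follows. This is an illustrative dense version under stated assumptions: the kernel uses the squared Euclidean distance, `np.linalg.eigh` stands in for the RSVD, and the function name `heat_kernel_coordinates` and the `n_coords` truncation parameter are hypothetical. The trivial first eigenpair is skipped, in line with Note-1 of this section.

```python
import numpy as np

def heat_kernel_coordinates(X, t0, t, n_coords):
    """Return (H, lam): row i of H is HKC(x_i); lam holds the eigenvalues used."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / t0)                     # assumed squared-distance kernel
    d = K.sum(axis=1)                        # diagonal of D
    P_sym = K / np.sqrt(np.outer(d, d))      # D^{-1/2} K D^{-1/2}
    evals, U = np.linalg.eigh(P_sym)         # dense stand-in for the RSVD
    order = np.argsort(evals)[::-1]
    evals, U = evals[order], U[:, order]
    V = U / np.sqrt(d)[:, None]              # right eigenvectors of P
    lam = evals[1:n_coords + 1]              # skip the trivial first pair
    H = np.exp(-lam * t) * V[:, 1:n_coords + 1]
    return H, lam

# Usage: the same data viewed at a fine and a coarse scale.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))
H_fine, lam = heat_kernel_coordinates(X, t0=2.0, t=0.5, n_coords=3)
H_coarse, _ = heat_kernel_coordinates(X, t0=2.0, t=5.0, n_coords=3)
```

Because $t$ only enters through the factors $e^{-\lambda_j t}$, the eigen-decomposition needs to be computed once; coordinates at other scales follow by rescaling the columns.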

Note-1: The first eigenvalue/eigenvector pair of $P$ is trivial (eigenvalue 1 with a constant
eigenvector) and should not be used.

Note-2: A subsequent SVD may be applied to the heat-kernel coordinates matrix to map the
points to the space of their 3 principal components for quick visualization.
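
The visualization step in Note-2 might look like the sketch below. The function name `project_to_3d` is hypothetical, and centering the columns before the SVD is an assumption made here so that the result matches the usual definition of principal components. A random matrix stands in for the heat-kernel coordinates matrix.

```python
import numpy as np

def project_to_3d(H):
    """Map the rows of H onto their first three principal components via SVD."""
    H_centered = H - H.mean(axis=0)          # assumption: center before the SVD
    U, s, Vt = np.linalg.svd(H_centered, full_matrices=False)
    return U[:, :3] * s[:3]                  # same as H_centered @ Vt[:3].T

# Usage: 50 points with 10 coordinates each (stand-in for an HKC matrix).
rng = np.random.default_rng(0)
H = rng.standard_normal((50, 10))
Y = project_to_3d(H)
```

The three columns of `Y` are mutually orthogonal and ordered by decreasing variance, so they can be passed directly to a 3-D scatter plot.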
4.2 Deliverables / Milestones


Date      Deliverables / Milestones                                                                   Status
--------  ------------------------------------------------------------------------------------------  ------
Oct 2010  Progress report for period 1, 1st quarter
Jan 2011  Progress report for period 1, 2nd quarter / complete randomized matrix decompositions task
Apr 2011  Progress report for period 1, 3rd quarter / complete approximate nearest neighbors task
Jul 2011  Progress report for period 1, 4th quarter / complete experiments - part 1
Oct 2011  Progress report for period 2, 1st quarter
Jan 2012  Progress report for period 2, 2nd quarter / complete multiscale SVD task
Apr 2012  Progress report for period 2, 3rd quarter
Jul 2012  Progress report for period 2, 4th quarter / complete experiments - part 2                   ✓
Oct 2012  Progress report for period 3, 1st quarter
Jan 2013  Progress report for period 3, 2nd quarter / complete multiscale Heat Kernel task
Apr 2013  Progress report for period 3, 3rd quarter
Jul 2013  Final project report + software + documentation on CDROM / complete experiments - part 3

5 Results and Discussion


An important issue with the MHKC algorithm described earlier is ascertaining the right
time/scale parameter(s) for any given dataset. The aim is to provide automated techniques that
help the analyst determine these parameters. For the representation problem of characterizing
the various operational phases of a system, we are investigating metrics that can be used to
assess the quality of the clustered representation at various time scales. This may help to
quickly narrow down the search for the time scales that exhibit the various local geometries
in the dataset.
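
To make the idea of scanning time scales concrete, here is a small sketch. The report does not name the metrics under investigation; the between-cluster to within-cluster scatter ratio used below is purely an illustrative stand-in, and the helper names (`scatter_ratio`, `hkc_spectrum`, `scan_scales`) are hypothetical. Cluster labels are taken as given, the kernel is assumed to use the squared Euclidean distance, and a dense eigen-solver stands in for the RSVD.

```python
import numpy as np

def scatter_ratio(coords, labels):
    """Between-cluster over within-cluster scatter (illustrative metric only)."""
    overall_mean = coords.mean(axis=0)
    between, within = 0.0, 0.0
    for lab in np.unique(labels):
        members = coords[labels == lab]
        center = members.mean(axis=0)
        between += len(members) * np.sum((center - overall_mean) ** 2)
        within += np.sum((members - center) ** 2)
    return between / within

def hkc_spectrum(X, t0, n_coords):
    """Non-trivial eigenvalues and right eigenvectors of P = D^{-1} K."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / t0)
    d = K.sum(axis=1)
    evals, U = np.linalg.eigh(K / np.sqrt(np.outer(d, d)))
    order = np.argsort(evals)[::-1][1:n_coords + 1]   # skip the trivial pair
    return evals[order], U[:, order] / np.sqrt(d)[:, None]

def scan_scales(evals, V, labels, t_values):
    """Score the coordinates exp(-lambda_j t) v_j at each candidate scale t."""
    return {t: scatter_ratio(np.exp(-evals * t) * V, labels) for t in t_values}

# Usage: two synthetic "operational phases" as well-separated point clouds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
labels = np.repeat([0, 1], 20)
evals, V = hkc_spectrum(X, t0=1.0, n_coords=3)
scores = scan_scales(evals, V, labels, t_values=[0.0, 1.0, 5.0, 25.0])
best_t = max(scores, key=scores.get)
```

The eigen-decomposition is computed once and reused for every candidate scale, so scanning many values of $t$ is cheap compared with building the spectrum.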


Applied Communication Sciences


ISRN  TELCORDIA-2012-09+PR-0GARAU 
Technical  Progress  Report 


6  Conclusions 


The  project  is  on  track  with  design/implementation  of  the  new  multiscale  heat  kernel  coordinates 
algorithms.  We  will  continue  with  algorithmic  improvements  and  experimentation  using  the 
developed  algorithms  in  the  next  quarter. 

No  problems  are  currently  anticipated. 